Goto

Collaborating Authors

 canonical vector


mvlearnR and Shiny App for multiview learning

arXiv.org Artificial Intelligence

The package mvlearnR and accompanying Shiny App is intended for integrating data from multiple sources or views or modalities (e.g. genomics, proteomics, clinical and demographic data). Most existing software packages for multiview learning are decentralized and offer limited capabilities, making it difficult for users to perform comprehensive integrative analysis. The new package wraps statistical and machine learning methods and graphical tools, providing a convenient and easy data integration workflow. For users with limited programming language, we provide a Shiny Application to facilitate data integration anywhere and on any device. The methods have potential to offer deeper insights into complex disease mechanisms. Availability and Implementation: mvlearnR is available from the following GitHub repository: https://github.com/lasandrall/mvlearnR. The web application is hosted on shinyapps.io and available at: https://multi-viewlearn.shinyapps.io/MultiView_Modeling/


Tensor Generalized Canonical Correlation Analysis

arXiv.org Artificial Intelligence

Regularized Generalized Canonical Correlation Analysis (RGCCA) is a general statistical framework for multi-block data analysis. RGCCA enables deciphering relationships between several sets of variables and subsumes many well-known multivariate analysis methods as special cases. However, RGCCA only deals with vector-valued blocks, disregarding their possible higher-order structures. This paper presents Tensor GCCA (TGCCA), a new method for analyzing higher-order tensors with canonical vectors admitting an orthogonal rank-R CP decomposition. Moreover, two algorithms for TGCCA, based on whether a separable covariance structure is imposed or not, are presented along with convergence guarantees. The efficiency and usefulness of TGCCA are evaluated on simulated and real data and compared favorably to state-of-the-art approaches.


Deep Gated Canonical Correlation Analysis

arXiv.org Machine Learning

Canonical Correlation Analysis (CCA) models can extract informative correlated representations from multimodal unlabelled data. Despite their success, CCA models may break if the number of variables exceeds the number of samples. We propose Deep Gated-CCA, a method for learning correlated representations based on a sparse subset of variables from two observed modalities. The proposed procedure learns two non-linear transformations and simultaneously gates the input variables to identify a subset of most correlated variables. The non-linear transformations are learned by training two neural networks to maximize a shared correlation loss defined based on their outputs. Gating is obtained by adding an approximate $\ell_0$ regularization term applied to the input variables. This approximation relies on a recently proposed continuous Gaussian based relaxation for Bernoulli variables which act as gates. We demonstrate the efficacy of the method using several synthetic and real examples. Most notably, the method outperforms other linear and non-linear CCA models.


Canonical Correlation Analysis (CCA) Based Multi-View Learning: An Overview

arXiv.org Machine Learning

Multi-view learning (MVL) is a strategy for fusing data from different sources or subsets. Canonical correlation analysis (CCA) is very important in MVL, whose main idea is to map data from different views onto a common space with the maximum correlation. The traditional CCA can only be used to calculate the linear correlation between two views. Moreover, it is unsupervised, and the label information is wasted in supervised learning tasks. Many nonlinear, supervised, or generalized extensions have been proposed to overcome these limitations. However, to our knowledge, there is no up-to-date overview of these approaches. This paper fills this gap, by providing a comprehensive overview of many classical and latest CCA approaches, and describing their typical applications in pattern recognition, multi-modal retrieval and classification, and multi-view embedding.


Canonical Correlation Analysis of Datasets with a Common Source Graph

arXiv.org Machine Learning

Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. The standard CCA however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-) correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph, and is invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, that is termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of common sources, while minimizing the distance between the wanted canonical variables. Tailored for diverse practical settings where the number of data is smaller than the data vector dimensions, the dual formulation of gCCA is also developed. One such setting includes kernels that are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, corroborating image classification tests over several real datasets are presented to showcase the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.


FDR-Corrected Sparse Canonical Correlation Analysis with Applications to Imaging Genomics

arXiv.org Machine Learning

Abstract--Reducing the number of false positive discoveries is presently one of the most pressing issues in the life sciences. It is of especially great importance for many applications in neuroimag-ing and genomics, where datasets are typically high-dimensional, which means that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that can be employed to address that issue. Thus it has gained great popularity as a tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique that is used to make sense of the cross-correlation of two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same mental illness patients), and sparse CCA extends the classical method to high-dimensional settings. Here we propose a way of applying the FDR concept to sparse CCA, and a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivation as well as simulation studies show that our procedure indeed keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics dataset from the Philadelphia Neurodevelopmental Cohort. Our results link the brain connectivity profiles derived from brain activity during an emotion identification task, as measured by functional magnetic resonance imaging (fMRI), to the corresponding subjects' genomic data. ANONICAL correlation analysis (due to Hotelling, [1]), or CCA, is a classical statistical technique, which is used to make sense of the cross-correlation of two sets of measurements collected on the same set of samples. More precisely, given two sets of random variables, CCA identifies linear combinations of each, which have maximum correlation with each other. The coefficients of these linear combinations of features are called canonical vectors. Like many classical statistical techniques, CCA fails in high-dimensional settings, when the number of variables in either of the two cross-correlated datasets exceeds the number of samples.


Sparse Weighted Canonical Correlation Analysis

arXiv.org Machine Learning

Given two data matrices $X$ and $Y$, sparse canonical correlation analysis (SCCA) is to seek two sparse canonical vectors $u$ and $v$ to maximize the correlation between $Xu$ and $Yv$. However, classical and sparse CCA models consider the contribution of all the samples of data matrices and thus cannot identify an underlying specific subset of samples. To this end, we propose a novel sparse weighted canonical correlation analysis (SWCCA), where weights are used for regularizing different samples. We solve the $L_0$-regularized SWCCA ($L_0$-SWCCA) using an alternating iterative algorithm. We apply $L_0$-SWCCA to synthetic data and real-world data to demonstrate its effectiveness and superiority compared to related methods. Lastly, we consider also SWCCA with different penalties like LASSO (Least absolute shrinkage and selection operator) and Group LASSO, and extend it for integrating more than three data matrices.


Sparse canonical correlation analysis

arXiv.org Machine Learning

Canonical correlation analysis was proposed by Hotelling [6] and it measures linear relationship between two multidimensional variables. In high dimensional setting, the classical canonical correlation analysis breaks down. We propose a sparse canonical correlation analysis by adding l1 constraints on the canonical vectors and show how to solve it efficiently using linearized alternating direction method of multipliers (ADMM) and using TFOCS as a black box. We illustrate this idea on simulated data.


A simple and provable algorithm for sparse diagonal CCA

arXiv.org Machine Learning

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated. Sparse CCA is NP-hard. We propose a novel combinatorial algorithm for sparse diagonal CCA, i.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical data-dependent global approximation guarantees, that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it on a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.


Canonical Divergence Analysis

arXiv.org Machine Learning

We aim to analyze the relation between two random vectors that may potentially have both different number of attributes as well as realizations, and which may even not have a joint distribution. This problem arises in many practical domains, including biology and architecture. Existing techniques assume the vectors to have the same domain or to be jointly distributed, and hence are not applicable. To address this, we propose Canonical Divergence Analysis (CDA). We introduce three instantiations, each of which permits practical implementation. Extensive empirical evaluation shows the potential of our method.